ggml-cuda: add mem check for fusion#19916
Conversation
| // Sanitize NaN to -FLT_MAX so the iterative argmax produces unique expert IDs. | ||
| // NaN comparisons always return false, which would cause the same expert to be | ||
| // selected repeatedly. -FLT_MAX compares normally and is still excluded by the | ||
| // -INFINITY sentinel used after each selection round. | ||
| // More relevant for the cuBLAS path. See https://github.com/ggml-org/llama.cpp/issues/19659 | ||
| #pragma unroll | ||
| for (int i = 0; i < experts_per_thread; i++) { | ||
| if (__isnanf(wt[i])) { | ||
| wt[i] = -FLT_MAX; | ||
| } | ||
| } |
There was a problem hiding this comment.
- If the issue is in llama.cpp and not cuBLAS, I feel we should use
fmaxfas a NaN-safe comparator: https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/group__CUDA__MATH__SINGLE.html#_CPPv45fmaxfff (I presume we are talking aboutval_s > max_val_slater on in this kernel?) - If the issue is in cuBLAS, I'd love more details so I can ask the cuBLAS team/take a look myself
There was a problem hiding this comment.
- Yes but it's not just
val_s > max_val_sit'sval_s > max_val_s || (val_s == max_val_s && expert < max_expert) - The linked issue has a repro. It's cuBLAS + Nemotron, so think it would be fun for you guys to look at :)
There was a problem hiding this comment.
Yes but it's not just val_s > max_val_s it's val_s > max_val_s || (val_s == max_val_s && expert < max_expert)
Shouldn't we be fine with fmaxf, so long as max_val & max_val_s are initialized to -FLT_MAX instead of -INFINITY at the beginning of the selection-loop over n_expert_used? At least for the case where k non-NAN values exist inside the logits for a given row. But at this point we are just pulling your proposal into the loop itself 😄
|
@JohannesGaessler can you review this PR? Apart from the NaN check it also fixes a latent bug |
| if ((b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end)) { | ||
| return true; | ||
| } | ||
|
|
||
| return false; |
There was a problem hiding this comment.
| if ((b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end)) { | |
| return true; | |
| } | |
| return false; | |
| return (b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end); |
This would maybe be slightly simpler but either way is fine I think.
* ggml-cuda: add mem check for fusion * Replace NaNs with -FLT_MAX * fix typo Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* ggml-cuda: add mem check for fusion * Replace NaNs with -FLT_MAX * fix typo Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Fixes #19659